Query for less data on pagination count by rhysyngsun · Pull Request #3106 · mitodl/mit-learn

rhysyngsun · 2026-03-25T21:30:13Z

What are the relevant tickets?

Description (What does it do?)

Refactors our pagination to have just 2 main implementations.
Sets a default pagination class and explicitly opts out noncompliant views to maintain their existing interfaces.
Updates the default pagination implementation to only query for the pk by default (this can be overridden if needed by subclassing).
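The PR text doesn't show main/pagination.py itself, so what follows is a minimal sketch of the kind of count override described above, not the actual implementation. It assumes the project's classes build on DRF's LimitOffsetPagination, whose paginate_queryset() calls self.get_count(queryset). PageNumberPagination goes through Django's Paginator instead and never calls get_count(). The mixin name and the count_fields attribute are hypothetical.

```python
class PkOnlyCountMixin:
    """Hypothetical mixin that narrows the COUNT query a DRF paginator issues.

    Intended to be combined with a DRF pagination class that calls
    ``self.get_count(queryset)``, such as LimitOffsetPagination.
    """

    # Fields selected inside the count query; subclasses can widen this.
    count_fields = ("pk",)

    def get_count(self, queryset):
        try:
            # For a Django QuerySet, .values(...) asks the database for just
            # the listed columns instead of every model field before counting.
            return queryset.values(*self.count_fields).count()
        except (AttributeError, TypeError):
            # Plain lists (or other non-QuerySet sequences) have no .values().
            return len(queryset)
```

Combined as, for example, `class DefaultPagination(PkOnlyCountMixin, LimitOffsetPagination)`, a view that needs a different count can set its own `count_fields` in a subclass, which matches the note that the default "can be overridden if needed by subclassing".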

The effect of this is to turn queries like the following, which DRF previously ran to determine the count (this example is for the /api/v1/featured/ endpoint):

SELECT COUNT(*)
FROM (
  SELECT DISTINCT "learning_resources_learningresource"."id" AS "col1",
    "learning_resources_learningresource"."created_on" AS "col2",
    "learning_resources_learningresource"."updated_on" AS "col3",
    "learning_resources_learningresource"."readable_id" AS "col4",
    "learning_resources_learningresource"."title" AS "col5",
    "learning_resources_learningresource"."description" AS "col6",
    "learning_resources_learningresource"."full_description" AS "col7",
    "learning_resources_learningresource"."last_modified" AS "col8",
    "learning_resources_learningresource"."published" AS "col9",
    "learning_resources_learningresource"."languages" AS "col10",
    "learning_resources_learningresource"."url" AS "col11",
    "learning_resources_learningresource"."image_id" AS "col12",
    "learning_resources_learningresource"."platform_id" AS "col13",
    "learning_resources_learningresource"."certification" AS "col14",
    "learning_resources_learningresource"."certification_type" AS "col15",
    "learning_resources_learningresource"."resource_type" AS "col16",
    "learning_resources_learningresource"."resource_category" AS "col17",
    "learning_resources_learningresource"."ocw_topics" AS "col18",
    "learning_resources_learningresource"."offered_by_id" AS "col19",
    "learning_resources_learningresource"."etl_source" AS "col20",
    "learning_resources_learningresource"."professional" AS "col21",
    "learning_resources_learningresource"."next_start_date" AS "col22",
    "learning_resources_learningresource"."prices" AS "col23",
    "learning_resources_learningresource"."availability" AS "col24",
    "learning_resources_learningresource"."completeness" AS "col25",
    "learning_resources_learningresource"."delivery" AS "col26",
    "learning_resources_learningresource"."license_cc" AS "col27",
    "learning_resources_learningresource"."test_mode" AS "col28",
    "learning_resources_learningresource"."continuing_ed_credits" AS "col29",
    "learning_resources_learningresource"."pace" AS "col30",
    "learning_resources_learningresource"."format" AS "col31",
    "learning_resources_learningresource"."location" AS "col32",
    "learning_resources_learningresource"."duration" AS "col33",
    "learning_resources_learningresource"."min_weeks" AS "col34",
    "learning_resources_learningresource"."max_weeks" AS "col35",
    "learning_resources_learningresource"."time_commitment" AS "col36",
    "learning_resources_learningresource"."min_weekly_hours" AS "col37",
    "learning_resources_learningresource"."max_weekly_hours" AS "col38",
    "learning_resources_learningresource"."require_summaries" AS "col39", COUNT(
    "learning_resources_learningresourceviewevent"."id") AS "_views_count",
    "learning_resources_learningresourcerelationship"."position" AS "position"
  FROM "learning_resources_learningresource"
  LEFT OUTER JOIN "learning_resources_learningresourceviewevent" ON (
    "learning_resources_learningresource"."id" =
    "learning_resources_learningresourceviewevent"."learning_resource_id")
  INNER JOIN "learning_resources_learningresourcerelationship" ON (
    "learning_resources_learningresource"."id" =
    "learning_resources_learningresourcerelationship"."child_id")
  WHERE (
    "learning_resources_learningresourcerelationship"."parent_id" IN (
      SELECT U0."featured_list_id"
      FROM "channels_channel" U0
      WHERE U0."channel_type" = 'unit'
    ) AND "learning_resources_learningresource"."published"
  )

into a query like this:

SELECT COUNT(*) 
FROM 
  (
    SELECT 
      DISTINCT "learning_resources_learningresource"."id" AS "col1", 
      "learning_resources_learningresourcerelationship"."position" AS "position" 
    FROM "learning_resources_learningresource" INNER JOIN "learning_resources_learningresourcerelationship" ON (
        "learning_resources_learningresource"."id" = "learning_resources_learningresourcerelationship"."child_id"
      ) 
    WHERE 
      (
        "learning_resources_learningresourcerelationship"."parent_id" IN (
          SELECT 
            U0."featured_list_id" 
          FROM "channels_channel" U0 
          WHERE U0."channel_type" = 'unit'
        ) 
        AND "learning_resources_learningresource"."published"
      )
  ) subquery;

For these two specific queries, this branch is more than 1000x faster: ~500ms for the version currently in main versus ~0.3ms for this branch. I don't expect exactly the same improvement across the board, but there should be some improvement, because the count query no longer has to read as much row data from disk and ideally only touches a few indices now.
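To see why the narrower count query returns the same number, here is a small, self-contained sketch using Python's built-in sqlite3 module. The tables are simplified, hypothetical stand-ins for the real schema, and because the first query above is cut off before any GROUP BY, the sketch assumes grouping by the selected resource columns plus position. Every extra column in the wide query is either determined by the resource id or is an aggregate computed per (id, position) group, so both queries yield one row per distinct (id, position) pair.

```python
import sqlite3

# Simplified, hypothetical stand-ins for the learning-resource tables.
SCHEMA = """
CREATE TABLE resource (id INTEGER PRIMARY KEY, title TEXT, published INTEGER);
CREATE TABLE view_event (id INTEGER PRIMARY KEY, resource_id INTEGER);
CREATE TABLE relationship (parent_id INTEGER, child_id INTEGER, position INTEGER);
CREATE TABLE channel (featured_list_id INTEGER, channel_type TEXT);
INSERT INTO resource VALUES (1, 'a', 1), (2, 'b', 1), (3, 'c', 1), (4, 'd', 0);
INSERT INTO view_event (resource_id) VALUES (1), (1), (2);
-- resource 1 sits in two featured lists at the same position
INSERT INTO relationship VALUES
  (10, 1, 0), (10, 2, 1), (11, 1, 0), (11, 3, 2), (10, 4, 3), (12, 2, 5);
INSERT INTO channel VALUES (10, 'unit'), (11, 'unit'), (12, 'topic');
"""

# Same filter shape as the featured endpoint: unit featured lists, published only.
WHERE = """
WHERE rel.parent_id IN (
    SELECT featured_list_id FROM channel WHERE channel_type = 'unit'
) AND r.published
"""

# Old shape: model columns plus the view-count annotation (GROUP BY assumed).
WIDE_COUNT = f"""
SELECT COUNT(*) FROM (
  SELECT DISTINCT r.id, r.title, r.published,
         COUNT(v.id) AS view_count, rel.position
  FROM resource r
  LEFT OUTER JOIN view_event v ON r.id = v.resource_id
  INNER JOIN relationship rel ON r.id = rel.child_id
  {WHERE}
  GROUP BY r.id, r.title, r.published, rel.position
) subquery
"""

# New shape: only the pk and the ordering column, no view-event join.
NARROW_COUNT = f"""
SELECT COUNT(*) FROM (
  SELECT DISTINCT r.id, rel.position
  FROM resource r
  INNER JOIN relationship rel ON r.id = rel.child_id
  {WHERE}
) subquery
"""


def compare_counts():
    """Run both count queries against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        wide = conn.execute(WIDE_COUNT).fetchone()[0]
        narrow = conn.execute(NARROW_COUNT).fetchone()[0]
    finally:
        conn.close()
    return wide, narrow


print(compare_counts())  # prints (3, 3)
```

SQLite timings say nothing about Postgres at production scale; the sketch only checks that the two counts agree.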

How can this be tested?

The app should still function. You likely won't see a huge difference in performance locally unless you have production-scale data around learning resources.

github-actions · 2026-03-25T21:30:32Z

OpenAPI Changes

No detectable change.

Unexpected changes? Ensure your branch is up-to-date with main (consider rebasing).

arslanashraf7 · 2026-03-30T06:11:59Z

@zamanafzal could you review this?

Copilot

Pull request overview

This PR centralizes DRF pagination behavior and optimizes pagination count queries by counting against a reduced field set (defaulting to pk) to avoid expensive COUNT(*) queries over SELECT DISTINCT <many columns> subqueries on complex querysets.

Changes:

Introduces main.pagination.DefaultPagination (and LargePagination) and sets it as the global DRF DEFAULT_PAGINATION_CLASS.
Removes per-app pagination classes/overrides in favor of the global default, and explicitly opts out (pagination_class = None) for endpoints that must preserve non-paginated responses.
Adjusts the learning-resources summary action queryset to avoid incompatible select_related() behavior and reduce selected columns.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

File	Description
video_shorts/views.py	Switches video shorts pagination to subclass the new global default pagination.
profiles/views.py	Explicitly disables pagination for the staff user admin endpoint to preserve its response shape.
news_events/views.py	Removes the app-local pagination implementation and relies on the global default pagination.
main/settings.py	Sets DRF `DEFAULT_PAGINATION_CLASS` to `main.pagination.DefaultPagination`.
main/pagination.py	Adds centralized `DefaultPagination`/`LargePagination` and overrides `get_count()` to count with fewer selected fields.
learning_resources_search/views.py	Disables pagination for the user subscription endpoint to preserve its response shape.
learning_resources/views.py	Removes local pagination classes, adopts centralized pagination, and updates the summary action queryset.
channels/views.py	Relies on global pagination for channel lists and disables pagination on non-list-like endpoints.
articles/views.py	Removes the app-local pagination implementation and relies on the global default pagination.

Copilot · 2026-03-30T08:48:09Z

learning_resources/views.py

        """
        queryset = self.filter_queryset(
-            self.get_queryset().values("id", "last_modified")
+            # we don't use `self.get_queryset()` here because there are incomplatible


Typo in comment: "incomplatible" should be "incompatible".

Suggested change

# we don't use `self.get_queryset()` here because there are incomplatible

# we don't use `self.get_queryset()` here because there are incompatible

Copilot · 2026-03-30T08:48:10Z

learning_resources/views.py

 )
 from main.constants import VALID_HTTP_METHODS
 from main.filters import MultipleOptionsFilterBackend
+from main.pagination import LargePagination


Removing DefaultPagination/LargePagination from learning_resources.views breaks existing imports (e.g., testimonials/views.py imports LargePagination from learning_resources.views). Update those imports/usages to point to main.pagination (or re-export from learning_resources.views) to avoid ImportError at runtime.

Suggested change

from main.pagination import LargePagination

from main.pagination import DefaultPagination, LargePagination

Copilot · 2026-03-30T08:48:10Z

learning_resources/views.py

+            # we don't use `self.get_queryset()` here because there are incomplatible
+            # `select_related()` invocations and we don't need related data anyway
+            LearningResource.objects.filter(published=True).only("id", "last_modified")
        )


The summary action now builds its own base queryset without .distinct(). Because LearningResource filters include many-to-many relations (e.g., topics/departments), filtering can introduce duplicate rows unless the queryset is made distinct. Consider applying .distinct() after filter_queryset (or otherwise ensuring uniqueness) to preserve correct counts/results.

Suggested change

)

).distinct()
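The duplicate-row concern can be reproduced with a toy many-to-many join. This is a minimal sketch in raw SQLite with hypothetical table names (resource, resource_topics), not the real Django models. In the ORM, adding .distinct() to the queryset is what turns a plain row count into a count of distinct rows; COUNT(DISTINCT r.id) plays that role here.

```python
import sqlite3


def topic_filter_counts():
    """Count resources matching a topic filter, with and without DISTINCT."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript("""
            CREATE TABLE resource (id INTEGER PRIMARY KEY, published INTEGER);
            CREATE TABLE resource_topics (resource_id INTEGER, topic TEXT);
            INSERT INTO resource VALUES (1, 1), (2, 1), (3, 1);
            -- resource 1 is tagged with both topics in the filter
            INSERT INTO resource_topics VALUES
              (1, 'math'), (1, 'physics'), (2, 'math'), (3, 'biology');
        """)
        joined = """
            FROM resource r
            JOIN resource_topics t ON r.id = t.resource_id
            WHERE r.published AND t.topic IN ('math', 'physics')
        """
        # Without DISTINCT, resource 1 is counted once per matching topic.
        plain = conn.execute("SELECT COUNT(*) " + joined).fetchone()[0]
        distinct = conn.execute("SELECT COUNT(DISTINCT r.id) " + joined).fetchone()[0]
    finally:
        conn.close()
    return plain, distinct


print(topic_filter_counts())  # prints (3, 2)
```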

zamanafzal

@rhysyngsun Thanks for the PR. The pagination count optimisation looks useful. I've left a few comments on the PR. Do you think they make sense?

zamanafzal · 2026-03-30T09:21:04Z

learning_resources/views.py

 )
 from main.constants import VALID_HTTP_METHODS
 from main.filters import MultipleOptionsFilterBackend
+from main.pagination import LargePagination


I think we need to make this change at

mit-learn/testimonials/views.py

Line 7 in 088cac9

from learning_resources.views import LargePagination

zamanafzal · 2026-03-30T10:25:30Z

learning_resources/views.py

-            self.get_queryset().values("id", "last_modified")
+            # we don't use `self.get_queryset()` here because there are incomplatible
+            # `select_related()` invocations and we don't need related data anyway
+            LearningResource.objects.filter(published=True).only("id", "last_modified")


The changes to LearningResourceViewSet.summary() appear to bypass self.get_queryset() and the original deduplication logic. Since summary is used for sitemaps, could we add a test (or at least do a manual check) to ensure the count and the returned results stay consistent with the main list endpoint for the same filters (i.e., no duplicate-driven count inflation or paging gaps)?
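One way to express the invariant being asked for is to collect the ids from every page of a list (following the `next` links) and compare them against the reported `count`. Below is a minimal, framework-agnostic sketch of just the check; the function name is hypothetical, and fetching the pages (e.g. with DRF's APIClient) is left out.

```python
from collections import Counter


def pagination_problems(reported_count, pages):
    """Return a list of problems found across paginated results (empty if consistent).

    ``pages`` is a list of pages, each a list of item ids in the order returned.
    """
    ids = [item_id for page in pages for item_id in page]
    problems = []
    # The same id on more than one page suggests duplicate rows from joins.
    duplicates = sorted(item_id for item_id, n in Counter(ids).items() if n > 1)
    if duplicates:
        problems.append(f"duplicate ids across pages: {duplicates}")
    # A mismatch means the count query and the page queries disagree.
    if len(ids) != reported_count:
        problems.append(f"count is {reported_count} but pages returned {len(ids)} items")
    return problems
```

A test could run this against the summary endpoint and against the main list endpoint with the same filters, then also compare the two id sets.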

zamanafzal · 2026-03-30T10:26:08Z

main/settings.py

    "DEFAULT_AUTHENTICATION_CLASSES": (
        "rest_framework.authentication.SessionAuthentication",
    ),
+    "DEFAULT_PAGINATION_CLASS": "main.pagination.DefaultPagination",


Since we're now setting DEFAULT_PAGINATION_CLASS in settings, some endpoints might switch from returning a plain list ([]) to a full paginated response ({count, next, previous, results}).
Even with the pagination_class = None you added in a few places, do you think we should do a quick audit of all list endpoints to make sure we haven't accidentally changed the response shape for any frontend consumers?

(doesn't hurt to check, but this is the sort of thing our OpenAPI CI check should actually be very good at catching)
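If the audit is done by hand rather than relying only on the OpenAPI check, a tiny helper can classify each decoded response body. This is a hypothetical sketch; it assumes the envelope keys that DRF's built-in LimitOffsetPagination and PageNumberPagination emit (count, next, previous, results), which a custom get_paginated_response() could change.

```python
PAGINATED_KEYS = {"count", "next", "previous", "results"}


def response_shape(payload):
    """Classify a decoded JSON body as 'paginated', 'list', or 'other'."""
    if isinstance(payload, dict) and PAGINATED_KEYS.issubset(payload):
        return "paginated"
    if isinstance(payload, list):
        return "list"
    return "other"
```

Recording the shape of each list endpoint before and after the settings change would flag any endpoint that flipped from "list" to "paginated".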

mbertrand

Looks good, agree with other comments about:

adding distinct() to the LearningResourceViewSet summary view and tests to make sure counts are as expected with no dupes
updating the LargePagination import for testimonials

zamanafzal

@rhysyngsun The changes look good to me. The only thing missing is tests to verify the LearningResourceViewSet summary view to make sure counts are as expected. If you think tests aren’t necessary, then we’re good to proceed with these changes.

rhysyngsun changed the title ~~Nl/pagination tweaks~~ Query for less data on pagination count Mar 25, 2026

rhysyngsun force-pushed the nl/pagination-tweaks branch 2 times, most recently from d604868 to 9731e2c March 27, 2026 20:37

rhysyngsun marked this pull request as ready for review March 27, 2026 20:49

zamanafzal requested a review from Copilot March 30, 2026 08:43

Copilot started reviewing on behalf of zamanafzal March 30, 2026 08:43

Copilot AI reviewed Mar 30, 2026

zamanafzal requested changes Mar 30, 2026

zamanafzal added the Waiting on author label Mar 30, 2026

mbertrand reviewed Mar 30, 2026

rhysyngsun added 3 commits March 30, 2026 12:36

Update DRF pagination to only use limited fields

3042eef

Fix the summary endpoint

c84beab

Feedback

03f664d

rhysyngsun force-pushed the nl/pagination-tweaks branch from 62fb2f7 to 03f664d March 30, 2026 16:37

zamanafzal approved these changes Mar 31, 2026

ChristopherChudzicki Mar 30, 2026

6 participants
